CLAIMS

A method for the segmentation of an audio stream into 
semantic or syntactic units wherein the audio stream is 
provided in a digitized format, comprising the steps 
of: 

determining a fundamental frequency for the digitized 
audio stream; 

detecting changes of the fundamental frequency in the 
audio stream; 

determining candidate boundaries for the semantic or 
syntactic units depending on the detected changes of 
the fundamental frequency; 

extracting at least one prosodic feature in the 
neighborhood of the candidate boundaries; 

determining boundaries for the semantic or 
syntactic units depending on the at least one 
prosodic feature. 

The method according to claim 1, further comprising
providing a threshold value for the voicedness of the
fundamental frequency estimates and determining whether
the voicedness of the fundamental frequency estimates is
lower than the threshold value.

The method according to claim 2, further comprising defining an
index function for the fundamental frequency having a 
value = 0 if the voicedness of the fundamental 
frequency is lower than the threshold value and having 
a value = 1 if the voicedness of the fundamental 
frequency is higher than the threshold value. 
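
For illustration only, and not as part of the claims, the index function described above might be sketched as follows. The function name `voicing_index`, the per-frame voicedness scores and the threshold of 0.5 are assumptions made for this example. The claim does not state which value applies when the voicedness equals the threshold; the sketch arbitrarily assigns 1 in that case.

```python
from typing import List, Sequence


def voicing_index(voicedness: Sequence[float], threshold: float) -> List[int]:
    """Return 0 for frames whose voicedness is below the threshold, else 1.

    Assumption: a frame exactly at the threshold is treated as voiced (1),
    because the claim leaves this case open.
    """
    return [0 if v < threshold else 1 for v in voicedness]


# Hypothetical per-frame voicedness scores for six analysis frames.
scores = [0.9, 0.8, 0.2, 0.1, 0.7, 0.95]
index = voicing_index(scores, threshold=0.5)
print(index)  # [1, 1, 0, 0, 1, 1]
```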

The method according to claim 3, wherein the at least
one prosodic feature is extracted in an environment of
the audio stream where the value of the index function
is equal to 0.

The method according to claim 4, wherein the 
environment is a time period between 500 and 4000
milliseconds.
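
As a non-limiting sketch, one possible reading of the environment is a window of fixed duration centred on each run of frames whose index value is 0. That reading, the names `unvoiced_runs` and `environment`, the 10 ms frame step and the 1000 ms default window are all assumptions for this example and are not recited in the claims.

```python
from typing import List, Tuple


def unvoiced_runs(index: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where index == 0."""
    runs = []
    start = None
    for i, value in enumerate(index):
        if value == 0 and start is None:
            start = i                    # a run of zeros begins here
        elif value != 0 and start is not None:
            runs.append((start, i))      # the run ended on the previous frame
            start = None
    if start is not None:                # run extends to the end of the stream
        runs.append((start, len(index)))
    return runs


def environment(run: Tuple[int, int], n_frames: int,
                frame_ms: int = 10, window_ms: int = 1000) -> Tuple[int, int]:
    """Frame range of about window_ms centred on the run, clipped to the stream."""
    if not 500 <= window_ms <= 4000:
        raise ValueError("window_ms must lie between 500 and 4000")
    half = window_ms // (2 * frame_ms)   # half-window length in frames
    centre = (run[0] + run[1]) // 2
    return max(0, centre - half), min(n_frames, centre + half)


# Hypothetical index sequence: 1 s voiced, 200 ms unvoiced, 1 s voiced.
index = [1] * 100 + [0] * 20 + [1] * 100
for run in unvoiced_runs(index):
    print(run, environment(run, len(index)))  # (100, 120) (60, 160)
```

Prosodic features would then be computed over the frames inside each returned range.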

The method according to claim 1, wherein the at least 
one prosodic feature is represented by the fundamental 
frequency. 

The method according to claim 1, wherein the extracting 
step involves extracting at least two prosodic features
and combining the at least two prosodic features.

The method according to claim 1, further comprising 
first detecting speech and non-speech segments in the
digitized audio stream and performing the steps of 
claim 1 thereafter only for detected speech segments. 

The method according to claim 8, wherein the detecting 
of speech and non-speech segments comprises utilizing
the signal energy or signal energy changes, 
respectively, in the audio stream. 
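
By way of example only, an energy-based distinction between speech and non-speech segments could look like the sketch below. The claims do not fix any particular energy measure; the mean squared amplitude over non-overlapping frames, the names `frame_energies` and `speech_flags`, the frame length and the threshold are assumptions for this example.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_flags(energies: Sequence[float], threshold: float) -> List[bool]:
    """Mark a frame as speech when its energy exceeds the threshold."""
    return [e > threshold for e in energies]


# Hypothetical signal: silence, a short burst, silence (three frames of four samples).
samples = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
energies = frame_energies(samples, frame_len=4)
print(speech_flags(energies, threshold=0.01))  # [False, True, False]
```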

The method according to claim 1, further comprising the 
step of performing a prosodic feature classification 
based on a predetermined classification tree. 

An article of manufacture comprising a computer usable 
medium having computer readable program code means 
embodied therein for causing segmentation of an audio 
stream into semantic or syntactic units, wherein the 
audio stream is provided in a digitized format, the 
computer readable program code means in the article of 
manufacture comprising computer readable program code 
means for causing a computer to effect: 

determining a fundamental frequency for the digitized
audio stream;

detecting changes of the fundamental frequency in the 
audio stream; 

determining candidate boundaries for the semantic or 
syntactic units depending on the detected changes of 
the fundamental frequency; 

extracting at least one prosodic feature in the 
neighborhood of the candidate boundaries; 

determining boundaries for the semantic or syntactic 
units depending on the at least one prosodic feature. 

A digital audio processing system for segmentation of a 
digitized audio stream into semantic or syntactic units 
comprising:

means for determining a fundamental frequency for the 
digitized audio stream, 

means for detecting changes of the fundamental 
frequency in the audio stream, 

means for determining candidate boundaries for the
semantic or syntactic units depending on the detected
changes of the fundamental frequency,

means for extracting at least one prosodic feature in 
the neighborhood of the candidate boundaries, and 

means for determining boundaries for the semantic or 
syntactic units depending on the at least one prosodic 
feature. 

An audio processing system according to claim 12, 
further comprising means for generating an index 
function for the voicedness of the fundamental 
frequency having a value = 0 if the voicedness of the 
fundamental frequency is lower than a predetermined 
threshold value and having a value = 1 if the 
voicedness of the fundamental frequency is higher than the
threshold value. 

An audio processing system according to claim 12 or 13,
further comprising means for detecting speech and
non-speech segments in the digitized audio stream,
particularly for detecting and analyzing the signal 
energy or signal energy changes, respectively, in the 
audio stream. 



